This page contains the results of CoNGA analyses.
Results in tables may have been filtered to reduce redundancy,
focus on the most important columns, and
limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.
Here we are assessing overall graph-vs-graph correlation by looking at
the shared edges between TCR and GEX neighbor graphs and comparing
that observed number to the number we would expect if the graphs were
completely uncorrelated. Our null model for uncorrelated graphs is to
take the vertices of one graph and randomly renumber them (permute their
labels). We compare the observed overlap to that expected under this null
model by computing a Z-score, either by permuting the vertices of one graph
vertices many times to get a mean and standard deviation of the overlap
distribution, or, for large graphs where this is time consuming,
by using a regression model for the
standard deviation. The different rows of this table correspond to the
different graph-graph comparisons that we make in the conga graph-vs-graph
analysis: we compare K-nearest-neighbor graphs for GEX and TCR at different
K values ("nbr_frac" aka neighbor-fraction, which reports K as a fraction
of the total number of clonotypes) to each other and to GEX and TCR "cluster"
graphs in which each clonotype is connected to all the other clonotypes with
the same (GEX or TCR) cluster assignment. For two K values (the default),
this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster
graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the
two K values (aka nbr_fracs).
The column to look at is *overlap_zscore*. Higher values indicate more
significant GEX/TCR covariation, with "interesting" levels starting around
zscores of 3-5.
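The shuffling procedure described above can be sketched as follows. This is a minimal illustration, not the CoNGA implementation: the function names, the representation of each graph as a set of directed (i, j) edges, and the default of 100 shuffles are all assumptions made for the example.

```python
import random
from statistics import mean, pstdev

def count_shared_edges(edges_a, edges_b):
    """Number of directed (i, j) edges present in both graphs."""
    return len(edges_a & edges_b)

def overlap_zscore(gex_edges, tcr_edges, num_nodes, num_shuffles=100, seed=0):
    """Z-score of the observed overlap against a vertex-relabeling null."""
    rng = random.Random(seed)
    observed = count_shared_edges(gex_edges, tcr_edges)
    labels = list(range(num_nodes))
    shuffled_overlaps = []
    for _ in range(num_shuffles):
        rng.shuffle(labels)  # labels[i] is the new label of vertex i
        relabeled = {(labels[i], labels[j]) for i, j in tcr_edges}
        shuffled_overlaps.append(count_shared_edges(gex_edges, relabeled))
    mu = mean(shuffled_overlaps)
    sd = pstdev(shuffled_overlaps)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, sd, z

# Toy example: two identical ring graphs on 30 vertices.
ring = {(i, (i + 1) % 30) for i in range(30)}
observed, mu, sd, z = overlap_zscore(ring, ring, num_nodes=30)
print(observed, mu, sd, z)
```

Because the two toy graphs are identical, the observed overlap equals the number of edges, while the shuffled overlaps stay close to what unrelated graphs would share.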
Columns in more detail:
graph_overlap_type: KNN ("nbr") or cluster versus KNN ("nbr") or cluster
nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes
overlap: the observed overlap (number of shared edges) between GEX and TCR
graphs
expected_overlap: the expected overlap under a shuffled null model.
overlap_zscore: a Z-score for the observed overlap computed by subtracting
the expected overlap and dividing by the standard deviation estimated from
shuffling.
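As a quick arithmetic check, the first row of the table on this page can be recomputed from its own columns. The closed form used for expected_overlap below (edges in one graph times edges in the other, divided by the number of possible directed edges) is an inference from the displayed numbers, not something stated in the text above.

```python
# Values copied from the first table row (gex_nbr_vs_tcr_nbr, nbr_frac 0.01).
nodes, gex_edges, tcr_edges = 496, 1984, 1984
overlap, overlap_mean, overlap_sdev = 22, 16.14, 4.584801

# Assumed closed form: every one of the nodes*(nodes-1) possible directed
# edges is equally likely to be shared under the shuffled null.
expected_overlap = gex_edges * tcr_edges / (nodes * (nodes - 1))

# Z-score using the mean and standard deviation obtained from shuffling.
overlap_zscore = (overlap - overlap_mean) / overlap_sdev

print(round(expected_overlap, 6), round(overlap_zscore, 6))
```

Both results agree with the expected_overlap and overlap_zscore columns of that row to the displayed precision. Note that the Z-score matches when the shuffled mean (overlap_mean) is subtracted, which is what one would expect for rows whose overlap_zscore_source is "shuffling".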
overlap  expected_overlap  overlap_mean  overlap_sdev  overlap_zscore  overlap_zscore_fitted  overlap_zscore_source  nodes  calculation_time  calculation_time_fitted  gex_edges  tcr_edges  gex_indegree_variance  gex_indegree_skewness  gex_indegree_kurtosis  tcr_indegree_variance  tcr_indegree_skewness  tcr_indegree_kurtosis  indegree_correlation_R  indegree_correlation_P  nbr_frac  graph_overlap_type
22       16.032323         16.14         4.584801      1.278136        2.387600               shuffling              496    0.057114          0.000981                 1984       1984       1.004293               1.830023               5.720712               0.419192               1.283199               3.219440               0.022573                0.616000                0.01      gex_nbr_vs_tcr_nbr
211      212.719192        213.18        14.794175     -0.147355       -0.178395              shuffling              496    0.202792          0.014598                 1984       26324      1.004293               1.830023               5.720712               0.074457               -0.249930              -1.183560              0.069705                0.121054                0.01      gex_nbr_vs_tcr_cluster
296      267.490909        270.27        18.411874     1.397468        2.589448               shuffling              496    0.247460          0.018544                 33102      1984       0.161475               0.059856               -1.029046              0.419192               1.283199               3.219440               -0.013584               0.762823                0.01      gex_cluster_vs_tcr_nbr
2436     2405.850505       2405.19       68.599081     0.449131        0.453443               shuffling              496    0.238049          0.148419                 24304      24304      0.770041               1.476363               2.258068               0.264748               1.775346               5.449563               -0.108070               0.016048                0.10      gex_nbr_vs_tcr_nbr
2709     2605.810101       2599.38       58.558822     1.871964        2.025644               shuffling              496    0.242820          0.161326                 24304      26324      0.770041               1.476363               2.258068               0.074457               -0.249930              -1.183560              -0.018191               0.686112                0.10      gex_nbr_vs_tcr_cluster
3341     3276.763636       3286.39       71.698242     0.761664        1.159376               shuffling              496    0.284870          0.204938                 33102      24304      0.161475               0.059856               -1.029046              0.264748               1.775346               5.449563               -0.058544               0.193033                0.10      gex_cluster_vs_tcr_nbr
graph_vs_graph
Graph vs graph analysis looks for correlation between GEX and TCR space
by finding statistically significant overlap between two similarity graphs,
one defined by GEX similarity and one by TCR sequence similarity.
Overlap is defined one node (clonotype) at a time by looking for overlap
between that node's neighbors in the GEX graph and its neighbors in the
TCR graph. The null model is that the two neighbor sets are chosen
independently at random.
CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where
K = neighborhood size is specified as a fraction of the number of
clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where
each clonotype is connected to all the other clonotypes in the same
(GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN,
GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the
K values (called nbr_fracs short for neighbor fractions).
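The two kinds of graph described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the dense distance-matrix input are assumptions, and rounding K down (with a minimum of 1) is a guess that happens to agree with the neighborhood sizes of 4 and 49 reported on this page for 496 clonotypes.

```python
def knn_graph(dist, nbr_frac):
    """Neighbor sets for a K-nearest-neighbor graph, K = nbr_frac * n."""
    n = len(dist)
    k = max(1, int(nbr_frac * n))  # assumption: round down, at least 1
    graph = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])
        graph.append(set(others[:k]))
    return graph

def cluster_graph(clusters):
    """Neighbor sets where each clonotype is linked to its cluster mates."""
    return [{j for j, c in enumerate(clusters) if c == ci and j != i}
            for i, ci in enumerate(clusters)]

# Toy data: 20 points on a line, so distances are just index differences.
points = list(range(20))
dist = [[abs(a - b) for b in points] for a in points]
knn = knn_graph(dist, nbr_frac=0.1)       # K = 2 for 20 points
clus = cluster_graph([0, 0, 1, 1, 1])
print(knn[0], knn[10], clus[2])
```

The same pair of functions would be applied once with GEX distances or clusters and once with TCR distances or clusters to obtain the graphs that are then compared.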
Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):
conga_score = P value for GEX/TCR overlap * number of clonotypes
mait_fraction = fraction of the overlap made up of 'invariant' T cells
num_neighbors* = size of neighborhood (K)
cluster_size = size of cluster (for KNN v cluster graph overlaps)
clone_index = 0-index of clonotype in adata object
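To make the conga_score definition concrete, here is a hedged sketch that treats the per-clonotype overlap as a hypergeometric draw, which is one natural reading of "the two neighbor sets are chosen independently at random". The function names, the choice of a hypergeometric tail, and the exclusion of the central clonotype from the population are assumptions; the total of 496 clonotypes is taken from the "nodes" column of the table earlier on this page.

```python
from math import comb

def hypergeom_tail(overlap, population, num_gex_nbrs, num_tcr_nbrs):
    """P(X >= overlap) when num_tcr_nbrs items are drawn without replacement
    from `population` items, num_gex_nbrs of which are GEX neighbors."""
    total = comb(population, num_tcr_nbrs)
    upper = min(num_gex_nbrs, num_tcr_nbrs)
    hits = sum(comb(num_gex_nbrs, x)
               * comb(population - num_gex_nbrs, num_tcr_nbrs - x)
               for x in range(overlap, upper + 1))
    return hits / total

def conga_score(overlap, num_clonotypes, num_gex_nbrs, num_tcr_nbrs):
    """P value for the overlap times the number of clonotypes."""
    # Assumption: the central clonotype is not part of the population.
    pval = hypergeom_tail(overlap, num_clonotypes - 1,
                          num_gex_nbrs, num_tcr_nbrs)
    return pval * num_clonotypes

# KNN-vs-KNN row with 4 neighbors in each graph and an overlap of 2.
score = conga_score(overlap=2, num_clonotypes=496,
                    num_gex_nbrs=4, num_tcr_nbrs=4)
print(score)
```

Under these assumptions the result is about 0.1453, in line with the conga_score shown for clone_index 181 in the table on this page. For KNN-vs-cluster rows the cluster size would presumably take the place of one of the neighborhood sizes, but that is not checked here.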
conga_score  num_neighbors_gex  num_neighbors_tcr  overlap  overlap_corrected  mait_fraction  clone_index  nbr_frac  graph_overlap_type      cluster_size  gex_cluster  tcr_cluster  va          ja         cdr3a             vb           jb          cdr3b
0.135374     49                 NaN                12       12                 0.0            249          0.10      gex_nbr_vs_tcr_cluster  42.0          0            5            TRAV21*01   TRAJ52*01  CAAPGAGGAGYGKLTF  TRBV20-1*01  TRBJ2-5*01  CSASGTLQETQYF
0.145254     4                  4.0                2        2                  0.0            181          0.01      gex_nbr_vs_tcr_nbr      NaN           2            1            TRAV18*01   TRAJ50*01  CVLRDRASYNKLMF    TRBV27*01    TRBJ1-5*01  CASSLAGDSNQPQYF
0.163098     49                 49.0               13       13                 0.0            176          0.10      gex_nbr_vs_tcr_nbr      NaN           6            1            TRAV18*01   TRAJ41*01  CVLGRSSSNSGYALNF  TRBV6-2*01   TRBJ1-1*01  CASRDRILTEAFF
0.190294     49                 NaN                16       16                 0.0            401          0.10      gex_nbr_vs_tcr_cluster  70.0          6            1            TRAV8-2*01  TRAJ36*01  CAVKQTGVNNLFF     TRBV4-3*01   TRBJ1-2*01  CASSQVYLFGGDDYTF
0.663982     49                 NaN                15       15                 0.0            214          0.10      gex_nbr_vs_tcr_cluster  70.0          1            1            TRAV2*01    TRAJ4*01   CAVEPGGYDKLIF     TRBV14*01    TRBJ1-5*01  CASSQEGGLNQPQYF
0.663982     49                 NaN                15       15                 0.0            437          0.10      gex_nbr_vs_tcr_cluster  70.0          4            1            TRAV8-4*01  TRAJ50*01  CAAGPFVTYNKLMF    TRBV6-3*01   TRBJ2-2*01  CASSYSGAAQLFF
tcr_clumping
This table stores the results of the TCR "clumping"
analysis, which looks for neighborhoods in TCR space with more TCRs than
expected by chance under a simple null model of VDJ rearrangement.
For each TCR in the dataset, we count how many TCRs are within a set of
fixed TCRdist radii (defaults: 24,48,72,96), and compare that number
to the expected number given the size of the dataset using a Poisson
model. The approach is inspired by the ALICE and TCRnet methods.
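The Poisson comparison described above can be sketched as an upper-tail probability. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the multiple-testing factor that turns the raw tail probability into pvalue_adj is not given in the text, so it is left as the parameter num_tests.

```python
import math

def poisson_tail(k, lam, max_terms=1000):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summing upper-tail terms."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, k + max_terms):
        # log of the Poisson probability mass at i, exponentiated
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        total += term
        if term < 1e-17 * total:
            break  # remaining terms no longer change the sum
    return min(total, 1.0)

def clumping_pvalue_adj(num_nbrs, expected_num_nbrs, num_tests):
    """Raw Poisson tail probability scaled by a number of tests (assumed)."""
    return poisson_tail(num_nbrs, expected_num_nbrs) * num_tests

# One observed neighbor where roughly 0.000009 were expected.
raw = poisson_tail(1, 0.000009)
print(raw)
```

For a row with num_nbrs = 1 and a very small expected_num_nbrs, the raw tail probability is essentially equal to the expected count itself, which is why such rows can be significant despite containing only a single neighbor.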
Columns:
clump_type='global' unless we are optionally looking for TCR clumps within
the individual GEX clusters
num_nbrs = neighborhood size (number of other TCRs with TCRdist
clump_type  clone_index  nbr_radius  pvalue_adj  num_nbrs  expected_num_nbrs  raw_count  va         ja         cdr3a        vb           jb          cdr3b            clonotype_fdr_value  clumping_group  clusters_gex  clusters_tcr
global      352          24          0.016892    1         0.000009           43.0       TRAV41*01  TRAJ33*01  CAVDSNYQLIW  TRBV20-1*01  TRBJ1-4*01  CSARDRDTNEKLFF   0.008446             1               0             5
global      353          24          0.016892    1         0.000009           43.0       TRAV41*01  TRAJ33*01  CAVDSNYQLIW  TRBV20-1*01  TRBJ1-4*01  CSARDRDTNEKLFF   0.008446             1               2             5
global      352          48          0.226258    1         0.000114           576.0      TRAV41*01  TRAJ33*01  CAVDSNYQLIW  TRBV20-1*01  TRBJ1-4*01  CSARDRDTNEKLFF   0.008446             1               0             5
global      353          48          0.226258    1         0.000114           576.0      TRAV41*01  TRAJ33*01  CAVDSNYQLIW  TRBV20-1*01  TRBJ1-4*01  CSARDRDTNEKLFF   0.008446             1               2             5
global      359          48          0.382189    1         0.000193           973.0      TRAV41*01  TRAJ50*01  CAVYYNKLMF   TRBV4-3*01   TRBJ1-4*01  CASSQDRTGGEKLFF  0.076438             2               1             4
tcr_db_match
This table stores significant matches between
TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv
P values of matches are assigned by turning the raw TCRdist
score into a P value based on a model of the V(D)J rearrangement
process, so matches between TCRs that are very far from germline
(for example) are assigned a higher significance.
Columns:
tcrdist: TCRdist distance between the two TCRs (adata query and db hit)
pvalue_adj: raw P value of the match * num query TCRs * num db TCRs
fdr_value: Benjamini-Hochberg FDR value for match
clone_index: index within adata of the query TCR clonotype
db_index: index of the hit in the database being matched
va,ja,cdr3a,vb,jb,cdr3b
db_XXX: where XXX is a field in the literature database
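The two corrections named in the column list can be sketched as follows. The function names are hypothetical, and applying Benjamini-Hochberg directly to the raw P values is an assumption, since the text does not say which P values the FDR is computed from.

```python
def adjust_match_pvalue(raw_pvalue, num_query_tcrs, num_db_tcrs):
    """pvalue_adj as described: raw P * num query TCRs * num db TCRs."""
    return raw_pvalue * num_query_tcrs * num_db_tcrs

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    fdr = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so each FDR value is the minimum
    # of p * n / rank over all ranks at or above its own.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        fdr[i] = running_min
    return fdr

fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(fdr)
```

The Bonferroni-style pvalue_adj grows with the database size, while the FDR value depends only on the ranks of the P values, so the two columns can give quite different impressions of the same match.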
tcr_graph_vs_gex_features
This table has results from a graph-vs-features analysis in which we
look for genes that are differentially expressed (elevated) in specific
neighborhoods of the TCR neighbor graph. Differential expression is
assessed by a t-test first, for speed, and then
by a Mann-Whitney U test for nbrhood/score combinations whose t-test P-value
passes an initial threshold (default is 10 times the P-value threshold).
Each row of the table represents a single significant association, in other
words a neighborhood (defined by the central clonotype index) and a
gene.
The columns are as follows:
ttest_pvalue_adj= ttest_pvalue * number of comparisons
mwu_pvalue_adj= mannwhitney-U P-value * number of comparisons
log2enr = log2 fold change of gene in neighborhood (will be positive)
gex_cluster= the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster= the consensus TCR cluster of the clonotypes w/ biased scores
num_fg= the number of clonotypes in the neighborhood (including center)
mean_fg= the mean value of the feature in the neighborhood
mean_bg= the mean value of the feature outside the neighborhood
feature= the name of the gene
mait_fraction= the fraction of the skewed clonotypes that have an invariant
TCR
clone_index= the index in the anndata dataset of the clonotype that is the
center of the neighborhood.
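A note on reading log2enr next to mean_fg and mean_bg: in the rows shown on this page, log2enr is not the log2 ratio of the two means as displayed. The values appear consistent with the means being on a log(1 + x) scale and the fold change being taken after undoing that transform. This is an observation about the displayed numbers, not a documented formula, and any pseudocount used by the software is unknown.

```python
import math

# mean_fg and mean_bg copied from the first row of the table on this page.
mean_fg, mean_bg = 3.116469, 0.150584

# Naive reading: log2 ratio of the displayed means.
naive = math.log2(mean_fg / mean_bg)

# Alternative reading (assumption): means are log1p-scaled, so undo the
# transform with expm1 before taking the ratio.
unlogged = math.log2(math.expm1(mean_fg) / math.expm1(mean_bg))

print(round(naive, 3), round(unlogged, 3))
```

The second value is close to the 7.052102 reported as log2enr for that row, whereas the naive ratio is much smaller.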
ttest_pvalue_adj  mwu_pvalue_adj  log2enr   gex_cluster  tcr_cluster  feature             mean_fg   mean_bg   num_fg  clone_index  mait_fraction  nbr_frac  graph_type   feature_type
1.176625e-22      1.763414e-51    7.052102  0            5            ENSMMUG00000043894  3.116469  0.150584  43      -1           0.0            0.0       tcr_cluster  gex
4.685757e-06      1.252591e-26    5.259000  0            5            ENSMMUG00000043894  2.253066  0.200828  50      425          0.0            0.1       tcr_nbr      gex
1.087104e-05      4.463841e-25    5.243075  2            5            ENSMMUG00000043894  2.246696  0.201542  50      160          0.0            0.1       tcr_nbr      gex
2.770734e-05      4.032554e-24    5.034332  0            5            ENSMMUG00000043894  2.162849  0.210942  50      29           0.0            0.1       tcr_nbr      gex
2.190950e-05      6.529895e-23    5.141585  0            5            ENSMMUG00000043894  2.206009  0.206103  50      213          0.0            0.1       tcr_nbr      gex
1.084771e-04      4.338609e-21    5.019883  0            5            ENSMMUG00000043894  2.157023  0.211595  50      203          0.0            0.1       tcr_nbr      gex
5.373957e-04      1.955217e-20    4.793971  0            5            ENSMMUG00000043894  2.065626  0.221841  50      111          0.0            0.1       tcr_nbr      gex
1.359607e-03      5.402948e-20    4.630569  5            5            ENSMMUG00000043894  1.999237  0.229284  50      402          0.0            0.1       tcr_nbr      gex
4.017466e-04      4.052746e-19    4.846244  2            5            ENSMMUG00000043894  2.086820  0.219465  50      78           0.0            0.1       tcr_nbr      gex
4.539481e-04      7.759940e-19    4.827226  2            5            ENSMMUG00000043894  2.079112  0.220330  50      140          0.0            0.1       tcr_nbr      gex
4.835479e-04      7.893327e-19    4.818860  2            5            ENSMMUG00000043894  2.075720  0.220710  50      81           0.0            0.1       tcr_nbr      gex
5.037712e-04      8.595118e-19    4.811466  0            5            ENSMMUG00000043894  2.072722  0.221046  50      255          0.0            0.1       tcr_nbr      gex
2.215590e-03      1.207632e-18    4.658240  0            5            ENSMMUG00000043894  2.010492  0.228022  50      52           0.0            0.1       tcr_nbr      gex
2.234398e-03      5.731625e-17    4.656072  0            5            ENSMMUG00000043894  2.009610  0.228121  50      74           0.0            0.1       tcr_nbr      gex
4.915064e-03      8.908936e-17    4.561954  0            5            ENSMMUG00000043894  1.971311  0.232415  50      156          0.0            0.1       tcr_nbr      gex
5.749920e-03      1.880442e-16    4.516485  2            5            ENSMMUG00000043894  1.952793  0.234491  50      23           0.0            0.1       tcr_nbr      gex
6.742742e-03      1.670471e-15    4.563627  0            5            ENSMMUG00000043894  1.971992  0.232339  50      313          0.0            0.1       tcr_nbr      gex
3.656652e-02      4.035340e-15    4.347598  2            5            ENSMMUG00000043894  1.883967  0.242207  50      249          0.0            0.1       tcr_nbr      gex
1.660635e-02      6.252678e-15    4.377533  0            5            ENSMMUG00000043894  1.896169  0.240839  50      8            0.0            0.1       tcr_nbr      gex
3.601341e-02      9.973538e-15    4.315171  0            5            ENSMMUG00000043894  1.870748  0.243689  50      129          0.0            0.1       tcr_nbr      gex
3.009212e-02      2.369439e-14    4.243301  2            5            ENSMMUG00000043894  1.841454  0.246973  50      307          0.0            0.1       tcr_nbr      gex
2.855111e-02      3.333033e-13    4.294776  2            5            ENSMMUG00000043894  1.862435  0.244621  50      352          0.0            0.1       tcr_nbr      gex
2.855111e-02      3.333033e-13    4.294776  2            5            ENSMMUG00000043894  1.862435  0.244621  50      353          0.0            0.1       tcr_nbr      gex
1.454555e-01      6.903124e-13    4.097158  0            5            ENSMMUG00000043894  1.781923  0.253647  50      416          0.0            0.1       tcr_nbr      gex
1.489973e-01      7.006006e-13    4.093609  2            5            ENSMMUG00000043894  1.780479  0.253809  50      439          0.0            0.1       tcr_nbr      gex
9.835122e-02      8.522497e-12    4.212458  0            5            ENSMMUG00000043894  1.828885  0.248382  50      253          0.0            0.1       tcr_nbr      gex
1.709787e-01      1.184556e-11    4.115429  2            5            ENSMMUG00000043894  1.789362  0.252813  50      492          0.0            0.1       tcr_nbr      gex
4.527795e-01      5.852798e-11    3.918442  5            5            ENSMMUG00000043894  1.709270  0.261792  50      59           0.0            0.1       tcr_nbr      gex
2.968522e-01      1.625121e-10    4.092760  0            5            ENSMMUG00000043894  1.780133  0.253847  50      452          0.0            0.1       tcr_nbr      gex
3.073661e-01      3.701793e-10    4.033128  2            5            ENSMMUG00000043894  1.755870  0.256567  50      34           0.0            0.1       tcr_nbr      gex
1.493935e-03      4.165555e-10    3.829200  5            8            ENSMMUG00000056515  2.143218  0.424974  50      176          0.0            0.1       tcr_nbr      gex
6.597906e-01      4.242324e-10    3.959445  5            5            ENSMMUG00000043894  1.725919  0.259925  50      380          0.0            0.1       tcr_nbr      gex
6.687360e-01      4.794901e-10    3.939188  0            5            ENSMMUG00000043894  1.717692  0.260847  50      125          0.0            0.1       tcr_nbr      gex
8.011512e-02      7.309025e-10    3.305261  0            3            ENSMMUG00000056515  1.896046  0.452684  50      274          0.0            0.1       tcr_nbr      gex
5.263138e-01      1.746978e-09    3.875111  0            5            ENSMMUG00000043894  1.691690  0.263762  50      218          0.0            0.1       tcr_nbr      gex
2.786468e+00      4.655902e-09    3.579095  2            5            ENSMMUG00000043894  1.572156  0.277163  50      483          0.0            0.1       tcr_nbr      gex
1.617443e+00      1.885459e-08    3.752812  0            5            ENSMMUG00000043894  1.642174  0.269314  50      191          0.0            0.1       tcr_nbr      gex
4.939308e+00      6.741138e-08    3.557013  2            5            ENSMMUG00000043894  1.563288  0.278157  50      42           0.0            0.1       tcr_nbr      gex
2.996390e+00      2.011981e-07    3.751065  5            5            ENSMMUG00000043894  1.641468  0.269393  50      421          0.0            0.1       tcr_nbr      gex
3.001085e+00      2.482482e-07    3.745496  0            5            ENSMMUG00000043894  1.639217  0.269645  50      18           0.0            0.1       tcr_nbr      gex
4.745798e+00      5.505835e-07    3.573640  0            5            ENSMMUG00000043894  1.569965  0.277409  50      457          0.0            0.1       tcr_nbr      gex
5.163879e+00      5.780375e-07    3.619675  0            5            ENSMMUG00000043894  1.588474  0.275334  50      65           0.0            0.1       tcr_nbr      gex
6.319222e-01      4.080144e-06    3.103270  4            3            ENSMMUG00000056515  1.802207  0.463204  50      259          0.0            0.1       tcr_nbr      gex
7.542347e+00      4.323079e-06    3.615010  0            5            ENSMMUG00000043894  1.586597  0.275544  50      422          0.0            0.1       tcr_nbr      gex
1.574875e+00      4.983006e-06    3.069843  0            0            ENSMMUG00000056515  1.786774  0.464934  50      16           0.0            0.1       tcr_nbr      gex
5.149386e-01      5.745158e-06    3.264272  6            1            ENSMMUG00000056515  1.876927  0.454827  50      148          0.0            0.1       tcr_nbr      gex
1.275155e+00      7.276922e-06    2.994163  0            0            ENSMMUG00000056515  1.751941  0.468839  50      258          0.0            0.1       tcr_nbr      gex
8.582253e-01      1.539187e-05    3.157389  4            3            ENSMMUG00000056515  1.827253  0.460396  50      208          0.0            0.1       tcr_nbr      gex
6.437223e-01      2.992497e-05    3.228311  0            0            ENSMMUG00000056515  1.860185  0.456704  50      107          0.0            0.1       tcr_nbr      gex
4.189580e+00      3.660095e-05    2.818277  0            3            ENSMMUG00000056515  1.671615  0.477845  50      275          0.0            0.1       tcr_nbr      gex
Omitted 13 lines
tcr_graph_vs_gex_features_plot
This plot summarizes the results of a graph
versus features analysis by labeling the clonotypes at the center of
each biased neighborhood with the name of the feature biased in that
neighborhood. The feature names are drawn in colored boxes whose
color is determined by the strength and direction of the feature score bias
(from bright red for features that are strongly elevated to bright blue
for features that are strongly decreased in the corresponding neighborhoods,
relative to the rest of the dataset).
At most one feature (the top scoring) is shown for each clonotype
(ie, neighborhood). The UMAP xy coordinates for this plot are
stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations
is 'mwu_pvalue_adj'. The threshold score for displaying a feature is
1.0. The feature column is 'feature'. Since
we also run graph-vs-features using "neighbor" graphs that are defined
by clusters, ie where each clonotype is connected to all the other
clonotypes in the same cluster, some biased features may be associated with
a cluster rather than a specific clonotype. Those features are labeled with
a '*' at the end and shown near the centroid of the clonotypes belonging
to that cluster.
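The labeling rules above (at most one feature per clonotype, a display threshold of 1.0 on 'mwu_pvalue_adj', and a trailing '*' for cluster-level hits) can be sketched like this. It is an illustration only: the function name is hypothetical, the way cluster-level hits are recognized (a graph_type ending in "cluster") and keyed (by cluster pair) is an assumption, and the toy rows are invented rather than taken from the table.

```python
def pick_plot_labels(rows, score_column="mwu_pvalue_adj", threshold=1.0):
    """Return {location: label}, keeping the best-scoring feature per location."""
    best = {}
    for row in rows:
        if row[score_column] > threshold:
            continue  # score too weak to be displayed
        is_cluster = row["graph_type"].endswith("cluster")
        # Assumption: cluster-level hits are keyed by their cluster pair,
        # neighborhood hits by the central clonotype index.
        key = ((row["gex_cluster"], row["tcr_cluster"]) if is_cluster
               else row["clone_index"])
        if key not in best or row[score_column] < best[key][score_column]:
            best[key] = row
    return {key: row["feature"]
            + ("*" if row["graph_type"].endswith("cluster") else "")
            for key, row in best.items()}

# Toy rows, invented for illustration.
rows = [
    {"clone_index": 7, "graph_type": "tcr_nbr", "gex_cluster": 0,
     "tcr_cluster": 5, "feature": "geneA", "mwu_pvalue_adj": 1e-6},
    {"clone_index": 7, "graph_type": "tcr_nbr", "gex_cluster": 0,
     "tcr_cluster": 5, "feature": "geneB", "mwu_pvalue_adj": 1e-3},
    {"clone_index": -1, "graph_type": "tcr_cluster", "gex_cluster": 0,
     "tcr_cluster": 5, "feature": "geneA", "mwu_pvalue_adj": 1e-9},
    {"clone_index": 9, "graph_type": "tcr_nbr", "gex_cluster": 1,
     "tcr_cluster": 2, "feature": "geneC", "mwu_pvalue_adj": 2.5},
]
labels = pick_plot_labels(rows)
print(labels)
```

In this toy input, clonotype 7 keeps only its stronger feature, the cluster-level hit gets a starred label, and the row with a score above the threshold is dropped.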
Image source: emoryPair11Final_tcr_graph_vs_gex_features_plot.png
tcr_graph_vs_gex_features_panels
Graph-versus-feature analysis was used to identify
a set of GEX features that showed biased distributions
in TCR neighborhoods. This plot shows the distribution of the
top-scoring GEX features on the TCR
UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie
Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons).
At most 3 features from clonotype neighborhoods
in each (GEX,TCR) cluster pair are shown. The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_tcr_graph_vs_gex_features_panels.png
tcr_genes_vs_gex_features
This table has results from a graph-vs-features analysis in which we
look for genes that are differentially expressed (elevated) in specific
neighborhoods of the TCR neighbor graph. Differential expression is
assessed by a t-test first, for speed, and then
by a Mann-Whitney U test for nbrhood/score combinations whose t-test P-value
passes an initial threshold (default is 10 times the P-value threshold).
Each row of the table represents a single significant association, in other
words a neighborhood (defined by the central clonotype index) and a
gene.
The columns are as follows:
ttest_pvalue_adj= ttest_pvalue * number of comparisons
mwu_pvalue_adj= mannwhitney-U P-value * number of comparisons
log2enr = log2 fold change of gene in neighborhood (will be positive)
gex_cluster= the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster= the consensus TCR cluster of the clonotypes w/ biased scores
num_fg= the number of clonotypes in the neighborhood (including center)
mean_fg= the mean value of the feature in the neighborhood
mean_bg= the mean value of the feature outside the neighborhood
feature= the name of the gene
mait_fraction= the fraction of the skewed clonotypes that have an invariant
TCR
clone_index= the index in the anndata dataset of the clonotype that is the
center of the neighborhood.
In this analysis the TCR graph is defined by
connecting all clonotypes that have the same VA, JA, VB, or JB gene segment
(it is run four times, once with each gene segment type).
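A graph of this kind is simple to build: every clonotype is linked to all others that share its gene segment. The sketch below shows one of the four runs (VA genes); the function name and the toy gene list are assumptions made for the example.

```python
from collections import defaultdict

def gene_segment_graph(genes):
    """Neighbor sets linking clonotypes that use the same gene segment."""
    members = defaultdict(list)
    for index, gene in enumerate(genes):
        members[gene].append(index)
    return [set(members[gene]) - {index} for index, gene in enumerate(genes)]

# Toy VA assignments for five clonotypes.
va_genes = ["TRAV18", "TRAV21", "TRAV18", "TRAV8-2", "TRAV18"]
graph = gene_segment_graph(va_genes)
print(graph)
```

Because each such neighborhood belongs to a gene segment rather than to one central clonotype, it makes sense that the rows in the table below carry the segment name in gene_segment and a clone_index of -1.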
ttest_pvalue_adj  mwu_pvalue_adj  log2enr    gex_cluster  tcr_cluster  feature             mean_fg   mean_bg   num_fg  clone_index  mait_fraction  gene_segment  graph_type  feature_type
8.385425e-01      4.129355e-83    10.102635  1            2            ENSMMUG00000056431  1.723031  0.004176  12      -1           0.0            TRAV35        tcr_genes   gex
1.010029e-23      3.634089e-79    10.344209  0            3            ENSMMUG00000063185  3.494982  0.024281  31      -1           0.0            TRBV4-2       tcr_genes   gex
5.384392e-01      1.700150e-72    9.028595   5            2            ENSMMUG00000059325  1.625720  0.007786  16      -1           0.0            TRAV25        tcr_genes   gex
9.093194e-03      3.825079e-70    9.144055   5            1            ENSMMUG00000062211  2.458646  0.018717  17      -1           0.0            TRBV12-2      tcr_genes   gex
2.697869e+00      4.603826e-60    8.295272   7            7            ENSMMUG00000056910  1.908267  0.018111  10      -1           0.0            TRAV16        tcr_genes   gex
3.908276e-02      5.865665e-60    8.879587   5            1            ENSMMUG00000060662  2.512758  0.023789  10      -1           0.0            TRAV8-7       tcr_genes   gex
1.192059e-21      3.603925e-56    8.117888   0            0            ENSMMUG00000062085  3.030515  0.068540  34      -1           0.0            TRBV4-3       tcr_genes   gex
1.339802e-41      8.405509e-56    7.358897   0            5            ENSMMUG00000043894  3.267886  0.143109  42      -1           0.0            TRBV20-1      tcr_genes   gex
6.785412e-01      3.781719e-54    9.250202   4            4            ENSMMUG00000054409  2.143225  0.012284  13      -1           0.0            TRAV6         tcr_genes   gex
5.808227e-01      4.596215e-54    7.211970   1            1            ENSMMUG00000061081  1.064753  0.012735  19      -1           0.0            TRAV8-2       tcr_genes   gex
4.261443e-06      5.419144e-52    8.408435   5            0            ENSMMUG00000065017  2.439549  0.030343  20      -1           0.0            TRAV12-1      tcr_genes   gex
2.713978e+00      2.841522e-25    6.809848   5            1            ENSMMUG00000061119  1.832875  0.045748  16      -1           0.0            TRAV18        tcr_genes   gex
1.453674e-06      3.751946e-20    6.988104   0            0            ENSMMUG00000051385  3.012414  0.141774  11      -1           0.0            TRBV7-4       tcr_genes   gex
7.944003e-10      6.060995e-19    5.042675   0            0            ENSMMUG00000056515  2.958106  0.440856  31      -1           0.0            TRBV6-2       tcr_genes   gex
2.889650e-14      2.144523e-18    5.463341   0            1            ENSMMUG00000056515  3.262967  0.450771  26      -1           0.0            TRBV6-3       tcr_genes   gex
3.386548e-01      1.424218e-06    4.432549   6            2            ENSMMUG00000043894  2.360039  0.367535  10      -1           0.0            TRBV19        tcr_genes   gex
1.412165e+00      1.704765e-06    5.041569   1            4            ENSMMUG00000043894  2.740171  0.364602  9       -1           0.0            TRBV21-1      tcr_genes   gex
2.460468e-03      2.036946e-04    4.364812   4            1            ENSMMUG00000056515  2.766862  0.544415  12      -1           0.0            TRBV10-2      tcr_genes   gex
3.417063e-03      1.467406e-01    1.124226   4            7            ENSMMUG00000059019  1.682992  1.101962  46      -1           0.0            TRBJ1-2       tcr_genes   gex
2.055242e-01      4.459292e-01    2.586831   7            0            CST3                0.548076  0.114664  5       -1           0.0            TRBV6-8       tcr_genes   gex
tcr_genes_vs_gex_features_panels
Graph-versus-feature analysis was used to identify
a set of GEX features that showed biased distributions
in TCR neighborhoods. This plot shows the distribution of the
top-scoring GEX features on the TCR
UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie
Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons).
At most 3 features from clonotype neighborhoods
in each (GEX,TCR) cluster pair are shown. The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_tcr_genes_vs_gex_features_panels.png
gex_graph_vs_tcr_features
This table has results from a graph-vs-features analysis in which we
look at the distribution of a set of TCR-defined features over the GEX
neighbor graph. We look for neighborhoods in the graph that have biased
score distributions, as assessed by a t-test first, for speed, and then
by a Mann-Whitney U test for nbrhood/score combinations whose t-test P-value
passes an initial threshold (default is 10 times the P-value threshold).
Each row of the table represents a single significant association, in other
words a neighborhood (defined by the central clonotype index) and a
tcr feature.
The columns are as follows:
ttest_pvalue_adj= ttest_pvalue * number of comparisons
ttest_stat= ttest statistic (sign indicates whether feature is up or down)
mwu_pvalue_adj= mannwhitney-U P-value * number of comparisons
gex_cluster= the consensus GEX cluster of the clonotypes w/ biased scores
tcr_cluster= the consensus TCR cluster of the clonotypes w/ biased scores
num_fg= the number of clonotypes in the neighborhood (including center)
mean_fg= the mean value of the feature in the neighborhood
mean_bg= the mean value of the feature outside the neighborhood
feature= the name of the TCR score
mait_fraction= the fraction of the skewed clonotypes that have an invariant
TCR
clone_index= the index in the anndata dataset of the clonotype that is the
center of the neighborhood.
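The two-stage screen described above (a fast t-test as a pre-filter, followed by a Mann-Whitney U test for the survivors) can be sketched with the standard library alone. This is not the CoNGA code: the function names are hypothetical, both P values use a normal approximation (which is cruder than an exact t distribution or an exact U distribution), and the Mann-Whitney step omits tie and continuity corrections.

```python
import math
from statistics import mean, variance

def normal_two_sided(z):
    """Two-sided P value from a standard normal (large-sample approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))

def welch_t(fg, bg):
    """Welch t statistic; the sign shows whether fg is above or below bg."""
    se = math.sqrt(variance(fg) / len(fg) + variance(bg) / len(bg))
    return (mean(fg) - mean(bg)) / se

def mwu_pvalue(fg, bg):
    """Mann-Whitney U P value using average ranks and a normal approximation."""
    pooled = sorted([(v, 0) for v in fg] + [(v, 1) for v in bg])
    rank_sum_fg, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share a rank
        rank_sum_fg += avg_rank * sum(1 for k in range(i, j + 1)
                                      if pooled[k][1] == 0)
        i = j + 1
    n1, n2 = len(fg), len(bg)
    u = rank_sum_fg - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return normal_two_sided((u - n1 * n2 / 2) / sigma)

def two_stage_test(fg, bg, num_tests, pvalue_threshold=1.0,
                   ttest_multiplier=10.0):
    """Return adjusted scores, or None if the t-test pre-filter fails."""
    t_stat = welch_t(fg, bg)
    ttest_pvalue_adj = normal_two_sided(t_stat) * num_tests
    if ttest_pvalue_adj > ttest_multiplier * pvalue_threshold:
        return None
    return {"ttest_pvalue_adj": ttest_pvalue_adj,
            "ttest_stat": t_stat,
            "mwu_pvalue_adj": mwu_pvalue(fg, bg) * num_tests}

# Toy scores: a clearly shifted neighborhood, then an unshifted one.
hit = two_stage_test([10.0 + i for i in range(10)],
                     [0.1 * i for i in range(40)], num_tests=100)
miss = two_stage_test([1.0, 2.0, 3.0, 4.0],
                      [1.0, 2.0, 3.0, 4.0, 2.5], num_tests=100)
print(hit, miss)
```

The point of the design is that the inexpensive first test removes most neighborhood/feature pairs, so the rank-based test only runs where it could plausibly pass.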
nbr_frac  graph_type   ttest_pvalue_adj  ttest_stat  mwu_pvalue_adj  gex_cluster  tcr_cluster  num_fg  mean_fg    mean_bg    feature   mait_fraction  clone_index  feature_type
0.0       gex_cluster  0.840617          -3.543602   0.264194        2.0          5.0          70.0    -1.803266  -1.185967  imhc      0.0            -1.0         tcr
0.0       gex_cluster  1.525273          -3.591452   0.308921        6.0          4.0          28.0    0.776978   1.038857   disorder  0.0            -1.0         tcr
0.0       gex_cluster  0.655427          3.900773    0.309534        6.0          1.0          28.0    1.971037   1.893739   beta      0.0            -1.0         tcr
0.1       gex_nbr      0.330157          -5.014420   0.398259        7.0          1.0          50.0    -0.322845  -0.002894  cd8       0.0            269.0        tcr
0.0       gex_cluster  3.135113          -3.330630   0.603744        6.0          4.0          28.0    -5.775247  -5.491135  mjenergy  0.0            -1.0         tcr
0.1       gex_nbr      0.810211          -4.787417   0.691913        0.0          5.0          50.0    -0.327341  -0.002390  cd8       0.0            97.0         tcr
0.0       gex_cluster  0.936262          3.545102    0.708752        3.0          3.0          54.0    -0.131740  -0.328623  kf5       0.0            -1.0         tcr
0.0       gex_cluster  0.863751          3.597980    0.736687        4.0          3.0          52.0    -0.607811  -1.351001  imhc      0.0            -1.0         tcr
0.1       gex_nbr      0.542962          -4.875506   0.795582        0.0          5.0          50.0    -0.310133  -0.004319  cd8       0.0            473.0        tcr
0.1       gex_nbr      0.280316          -4.979091   4.841975        0.0          5.0          50.0    -0.265452  -0.009328  cd8       0.0            444.0        tcr
gex_graph_vs_tcr_features_plot
This plot summarizes the results of a graph
versus features analysis by labeling the clonotypes at the center of
each biased neighborhood with the name of the feature biased in that
neighborhood. The feature names are drawn in colored boxes whose
color is determined by the strength and direction of the feature score bias
(from bright red for features that are strongly elevated to bright blue
for features that are strongly decreased in the corresponding neighborhoods,
relative to the rest of the dataset).
At most one feature (the top scoring) is shown for each clonotype
(ie, neighborhood). The UMAP xy coordinates for this plot are
stored in adata.obsm['X_gex_2d']. The score used for ranking correlations
is 'mwu_pvalue_adj'. The threshold score for displaying a feature is
1.0. The feature column is 'feature'. Since
we also run graph-vs-features using "neighbor" graphs that are defined
by clusters, ie where each clonotype is connected to all the other
clonotypes in the same cluster, some biased features may be associated with
a cluster rather than a specific clonotype. Those features are labeled with
a '*' at the end and shown near the centroid of the clonotypes belonging
to that cluster.
Image source: emoryPair11Final_gex_graph_vs_tcr_features_plot.png
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
GEX landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_gex' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are GEX clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=49 nearest neighbors (0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Since features of the same type (GEX or TCR) as the landscape and
neighbor graph (ie GEX features) are more highly
correlated over graph neighborhoods, their neighbor-averaged scores
will show more extreme variation. For this reason, the nbr-averaged
scores for these features from the same modality as the landscape
itself are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores (multiply by 3.03
to get the color scores for the GEX features).
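The scoring steps for one heatmap row (Z-score normalization, neighbor averaging, and downscaling of same-modality features) can be sketched as below. The function names are hypothetical; using the population standard deviation and including each clonotype in its own neighbor average are assumptions, since the text does not specify either.

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score normalize a feature (population standard deviation assumed)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def nbr_average(scores, nbrs):
    """Average each score with those of its neighbors (self included: assumed)."""
    return [mean([scores[i]] + [scores[j] for j in nbrs[i]])
            for i in range(len(scores))]

def heatmap_row(values, nbrs, same_modality,
                rescale_factor_for_self_features=0.33):
    """Neighbor-averaged Z-scores, downscaled for same-modality features."""
    smoothed = nbr_average(zscore(values), nbrs)
    factor = rescale_factor_for_self_features if same_modality else 1.0
    return [factor * s for s in smoothed]

# Toy feature on four clonotypes arranged in a chain.
values = [1.0, 2.0, 3.0, 4.0]
nbrs = [[1], [0, 2], [1, 3], [2]]
row_self = heatmap_row(values, nbrs, same_modality=True)
row_other = heatmap_row(values, nbrs, same_modality=False)
print(row_self, row_other)
```

The factor of 3.03 quoted above is simply 1 / 0.33, the inverse of the downscaling applied to same-modality features.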
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
TCR landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_tcr' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are TCR clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=49 nearest neighbors (0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Since features of the same type (GEX or TCR) as the landscape and
neighbor graph (ie TCR features) are more highly
correlated over graph neighborhoods, their neighbor-averaged scores
will show more extreme variation. For this reason, the nbr-averaged
scores for these features from the same modality as the landscape
itself are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores (multiply by 3.03
to get the color scores for the TCR features).
Summary figure for the graph-vs-graph and
graph-vs-features analyses.
Image source: emoryPair11Final_graph_vs_summary.png
gex_clusters_tcrdist_trees
These are TCRdist hierarchical clustering trees
for the GEX clusters (cluster assignments stored in
adata.obs['clusters_gex']). The trees are colored by CoNGA score
with a color score range of 4.96e+00 (blue) to 4.96e-09 (red).
For coloring, CoNGA scores are log-transformed, negated, and square-rooted
(with an offset in there, too, roughly speaking).
Image source: emoryPair11Final_gex_clusters_tcrdist_trees.png
conga_threshold_tcrdist_tree
This is a TCRdist hierarchical clustering tree
for the clonotypes with CoNGA score less than 10.0.
The tree is colored by CoNGA score
with a color score range of 1.00e+01 (blue) to 1.00e-08 (red).
For coloring, CoNGA scores are log-transformed, negated, and square-rooted
(with an offset in there, too, roughly speaking).
Image source: emoryPair11Final_conga_threshold_tcrdist_tree.png
hotspot_features
This analysis finds GEX (TCR) features that show a biased
distribution across the TCR (GEX) neighbor graph,
using a simplified version of the Hotspot method
from the Yosef lab:
DeTomaso, D., & Yosef, N. (2021).
"Hotspot identifies informative gene modules across modalities
of single-cell genomics."
Cell Systems, 12(5), 446–456.e9.
PMID:33951459
Columns:
Z: HotSpot Z statistic
pvalue_adj: Raw P value times the number of tests (crude Bonferroni
correction)
nbr_frac: The K NN nbr fraction used for the neighbor graph construction
(nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)
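The relationship between the Z and pvalue_adj columns can be sketched as a normal tail probability followed by the crude Bonferroni correction described above. Converting Z to a one-sided upper-tail normal P value is an assumption made for this example, and the function names are hypothetical; the number of tests is not stated in the text and is left as a parameter.

```python
import math

def z_to_pvalue(z):
    """One-sided upper-tail P value of a standard normal (assumed form)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def hotspot_pvalue_adj(z, num_tests):
    """Raw P value times the number of tests, left uncapped."""
    return z_to_pvalue(z) * num_tests

for z in (4.0, 5.0, 8.0):
    print(z, z_to_pvalue(z))
```

The adjusted value is left uncapped here because other tables on this page show adjusted P values above 1, which suggests the reported values are not clipped.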
Z          pvalue_adj     feature             feature_type  nbr_frac
45.040090  0.000000e+00   ENSMMUG00000043894  gex           0.10
29.885820  2.323074e-192  ENSMMUG00000056515  gex           0.10
21.273270  1.550664e-96   ENSMMUG00000043894  gex           0.01
20.239359  3.404804e-87   ENSMMUG00000063185  gex           0.10
18.675301  6.073724e-74   ENSMMUG00000062085  gex           0.10
18.015172  1.143937e-68   ENSMMUG00000059325  gex           0.10
16.124270  1.330549e-54   ENSMMUG00000059325  gex           0.01
16.087118  2.425968e-54   ENSMMUG00000056431  gex           0.10
15.807421  2.135876e-52   ENSMMUG00000054409  gex           0.01
15.736950  6.519450e-52   ENSMMUG00000065017  gex           0.10
15.475130  3.944108e-50   ENSMMUG00000054409  gex           0.10
15.196598  2.876573e-48   ENSMMUG00000056515  gex           0.01
14.791922  1.275595e-45   ENSMMUG00000061081  gex           0.10
13.468720  1.845581e-37   ENSMMUG00000061119  gex           0.10
13.459147  2.100942e-37   ENSMMUG00000048246  gex           0.01
13.320344  1.361518e-36   ENSMMUG00000061081  gex           0.01
12.588912  1.876685e-32   ENSMMUG00000061119  gex           0.01
12.016336  2.252171e-29   ENSMMUG00000056431  gex           0.01
11.728704  7.015476e-28   ENSMMUG00000065017  gex           0.01
9.651768   3.731394e-18   ENSMMUG00000056910  gex           0.10
9.286605   1.230178e-16   ENSMMUG00000057062  gex           0.10
8.832353   7.915783e-15   ENSMMUG00000056910  gex           0.01
8.756327   1.557847e-14   ENSMMUG00000060662  gex           0.10
8.706822   2.413548e-14   ENSMMUG00000062211  gex           0.10
8.593218   6.531099e-14   PKHD1L1             gex           0.01
8.501788   1.441824e-13   HCN3                gex           0.01
8.336248   5.922382e-13   TMC4                gex           0.01
8.291139   8.662834e-13   ENSMMUG00000003532  gex           0.10
8.064314   5.687864e-12   ENSMMUG00000049680  gex           0.01
7.044762   1.495524e-10   cd8                 tcr           0.10
6.916199   3.582881e-08   ENSMMUG00000061255  gex           0.01
6.701394   1.594342e-07   ENSMMUG00000060662  gex           0.01
6.608458   2.999171e-07   ENSMMUG00000063185  gex           0.01
6.568198   3.933145e-07   ENSMMUG00000051385  gex           0.10
6.473282   7.406116e-07   CD8A                gex           0.10
6.037337   1.210023e-05   ENSMMUG00000062085  gex           0.01
5.968999   1.843590e-05   VAT1L               gex           0.01
5.950222   2.068053e-05   EPHA1               gex           0.01
5.839269   4.049191e-05   RBMS2               gex           0.01
5.779115   5.799540e-05   ENSMMUG00000062211  gex           0.01
5.733677   7.589908e-05   ENSMMUG00000051857  gex           0.01
5.651851   1.225866e-04   ENSMMUG00000051385  gex           0.01
5.634538   1.355601e-04   ENSMMUG00000056196  gex           0.01
5.608222   1.578700e-04   TBX19               gex           0.01
5.566041   2.012555e-04   HOPX                gex           0.10
5.499141   2.947459e-04   ENSMMUG00000048246  gex           0.10
5.427430   4.415346e-04   CDK18               gex           0.01
5.073621   3.014405e-03   ENSMMUG00000057062  gex           0.01
4.816631   1.127576e-02   ENSMMUG00000006133  gex           0.01
4.746981   1.594606e-02   ENSMMUG00000064087  gex           0.01
Omitted 6 lines
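The nbr_frac and pvalue_adj columns can be related to K and to a raw P value with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, truncating K to an integer is an assumption, and so is the use of a one-sided upper-tail normal P value for the Z statistic.

```python
import math

def num_neighbors(nbr_frac, num_clonotypes):
    # Assumption: K is the neighbor fraction times the clonotype count,
    # truncated to an integer and never allowed to drop below 1.
    return max(1, int(nbr_frac * num_clonotypes))

def adjusted_pvalue(z, num_tests):
    # Assumption: one-sided upper-tail P value of a standard normal.
    raw_p = 0.5 * math.erfc(z / math.sqrt(2.0))
    # Crude Bonferroni: raw P value times the number of tests, not capped at 1.
    return raw_p * num_tests

# Hypothetical inputs, not taken from this dataset.
print(num_neighbors(0.1, 500))
print(adjusted_pvalue(5.0, 1000))
```

Because the product is not capped, weak features can have pvalue_adj values above 1.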
hotspot_gex_umap
HotSpot analysis (Nir Yosef lab, PMID: 33951459)
was used to identify a set of GEX (TCR) features that showed biased
distributions in TCR (GEX) space. This plot shows the distribution of the
top-scoring HotSpot features on the GEX
UMAP 2D landscape. The features are ranked by adjusted P value
(raw P value * number of comparisons). The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Features are filtered based on correlation coefficient to reduce
redundancy: if a feature has a correlation of >= 0.9
(the max_feature_correlation argument to conga.plotting.plot_hotspot_umap)
to a previously plotted feature, that feature is skipped.
Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
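The neighbor averaging and the redundancy filter described for the UMAP panels can be sketched as follows. This is a minimal illustration, not the code behind conga.plotting.plot_hotspot_umap: the helper names are hypothetical, including each clonotype's own score in its average is an assumption, and the filter uses whatever score vectors it is given.

```python
import math

def neighbor_average(scores, neighbors):
    # Assumption: each clonotype's own score is included in its average.
    averaged = []
    for i, nbrs in enumerate(neighbors):
        group = [scores[i]] + [scores[j] for j in nbrs]
        averaged.append(sum(group) / len(group))
    return averaged

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (norm_a * norm_b)

def filter_redundant(ranked_features, max_feature_correlation=0.9):
    """ranked_features: (name, scores) pairs, best-ranked first.
    A feature is skipped if its correlation with any already-kept
    feature is at or above the threshold."""
    kept = []
    for name, scores in ranked_features:
        if any(pearson(scores, prev) >= max_feature_correlation
               for _, prev in kept):
            continue
        kept.append((name, scores))
    return [name for name, _ in kept]

# Hypothetical features: B nearly duplicates A, C does not.
ranked = [("A", [1.0, 2.0, 3.0, 4.0]),
          ("B", [2.0, 4.0, 6.0, 8.1]),
          ("C", [4.0, 1.0, 3.0, 2.0])]
print(filter_redundant(ranked))
print(neighbor_average([0.0, 3.0, 6.0], [[1], [0, 2], [1]]))
```

Since features are visited in rank order, the top-ranked member of each correlated group is the one that gets plotted.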
hotspot_gex_clustermap
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
GEX landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_gex' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are GEX clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=49 nearest neighbors (K=0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Features of the same type (GEX or TCR) as the landscape and
neighbor graph (i.e., GEX features) are more highly
correlated over graph neighborhoods, so their neighbor-averaged scores
show more extreme variation. For this reason, the nbr-averaged
scores for features from the same modality as the landscape
are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores; for the GEX features, multiply the colorbar
values by 3.03 (=1/0.33) to recover the unscaled scores.
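The scoring steps for a heatmap row (Z-score normalization, neighbor averaging, and downscaling of same-modality features) can be sketched as below. This is an illustration under assumptions, not the CoNGA plotting code: the function names are hypothetical, the population standard deviation is used for the Z-score, and each clonotype's own score is included in its neighbor average.

```python
import math

def zscore(values):
    # Assumption: population standard deviation (divide by n).
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]

def clustermap_scores(values, neighbors, feature_type, landscape_type,
                      rescale_factor_for_self_features=0.33):
    z = zscore(values)
    averaged = []
    for i, nbrs in enumerate(neighbors):
        # An empty neighbor list leaves the score unaveraged (the K=0 case).
        group = [z[i]] + [z[j] for j in nbrs]
        averaged.append(sum(group) / len(group))
    # Features from the same modality as the landscape are downscaled.
    if feature_type == landscape_type:
        averaged = [rescale_factor_for_self_features * v for v in averaged]
    return averaged

# Hypothetical feature on a GEX landscape, without neighbor averaging.
values = [1.0, 2.0, 3.0, 4.0]
no_nbrs = [[] for _ in values]
as_gex = clustermap_scores(values, no_nbrs, "gex", "gex")
as_tcr = clustermap_scores(values, no_nbrs, "tcr", "gex")
print(as_gex, as_tcr)
print(round(1 / 0.33, 2))  # factor that undoes the downscaling
```

The same input gives a row 0.33 times as large when it is treated as a same-modality feature, which is why colorbar values for those rows are multiplied by about 3.03 to get back to the unscaled scores.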
hotspot_tcr_umap
HotSpot analysis (Nir Yosef lab, PMID: 33951459)
was used to identify a set of GEX (TCR) features that showed biased
distributions in TCR (GEX) space. This plot shows the distribution of the
top-scoring HotSpot features on the TCR
UMAP 2D landscape. The features are ranked by adjusted P value
(raw P value * number of comparisons). The raw scores for each feature
are averaged over the K nearest neighbors (K is indicated in the lower
right corner of each panel) for each clonotype. The min and max
nbr-averaged scores are shown in the upper corners of each panel.
Features are filtered based on correlation coefficient to reduce
redundancy: if a feature has a correlation of >= 0.9
(the max_feature_correlation argument to conga.plotting.plot_hotspot_umap)
to a previously plotted feature, that feature is skipped.
Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png
hotspot_tcr_clustermap
This plot shows the distribution of significant
features from graph-vs-features or HotSpot analysis plotted across the
TCR landscape. Rows are features and columns are
individual clonotypes. Columns are ordered by hierarchical clustering
(if a dendrogram is present above the heatmap) or by a 1D UMAP projection
(used for very large datasets or if 'X_pca_tcr' is not present in
adata.obsm_keys()). Rows are ordered by hierarchical clustering with
a correlation metric.
The row colors to the left of the heatmap show the feature type
(blue=TCR, orange=GEX). The row colors to the left of those
indicate the strength of the graph-vs-feature correlation
(also included in the feature labels to the right of the heatmap;
keep in mind that highly significant P values for some features may shift
the colorscale so everything else looks dark blue).
The column colors above the heatmap are TCR clusters
(and TCR V/J genes if plotting against the TCR landscape). The text
above the column colors provides more info.
Feature scores are Z-score normalized and then averaged over the
K=49 nearest neighbors (K=0 means no nbr-averaging).
The 'coolwarm' colormap is centered at Z=0.
Features of the same type (GEX or TCR) as the landscape and
neighbor graph (i.e., TCR features) are more highly
correlated over graph neighborhoods, so their neighbor-averaged scores
show more extreme variation. For this reason, the nbr-averaged
scores for features from the same modality as the landscape
are downscaled by a factor of
rescale_factor_for_self_features=0.33.
The colormap in the top left is for the Z-score normalized,
neighbor-averaged scores; for the TCR features, multiply the colorbar
values by 3.03 (=1/0.33) to recover the unscaled scores.